Overview
Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 2820232 |
| Missing cells | 546560 |
| Missing cells (%) | 1.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.3 GiB |
| Average record size in memory | 495.3 B |
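The missing-cell percentage above follows from the raw counts. As a minimal sketch in plain Python (the variable names are illustrative and not part of the report), the share is the missing count divided by rows times columns:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 18
n_observations = 2_820_232
missing_cells = 546_560

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```

The missing-cell count equals the number of missing values the alerts report for POSSIBLENterm, so every missing cell in the dataset sits in that one column.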
Variable types
| Text | 2 |
|---|---|
| Numeric | 11 |
| Boolean | 1 |
| Categorical | 4 |
Alerts
| Alert | Type |
|---|---|
| POSSIBLENterm has constant value "True" | Constant |
| Insidesource has constant value "TMHMM2.0" | Constant |
| TMhelixsource has constant value "TMHMM2.0" | Constant |
| Outsidesource has constant value "TMHMM2.0" | Constant |
| ExpnumberofAAsinTMHs is highly overall correlated with Insideend and 5 other fields | High correlation |
| Insideend is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields | High correlation |
| Insidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields | High correlation |
| Length is highly overall correlated with Insideend and 1 other field | High correlation |
| Outsideend is highly overall correlated with Length and 3 other fields | High correlation |
| Outsidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields | High correlation |
| PredictedTMHsNumber is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields | High correlation |
| TMhelixend is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields | High correlation |
| TMhelixstart is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields | High correlation |
| POSSIBLENterm has 546560 (19.4%) missing values | Missing |
| Protein_ID has unique values | Unique |
| Expnumberfirst60AAs has 143860 (5.1%) zeros | Zeros |
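The high-correlation alerts can be cross-checked against the matrix in the Correlations section of this report. The sketch below copies the five rows visible there and lists, for each variable, the other columns whose absolute correlation exceeds a cut-off. The cut-off of 0.5 is an assumption: the report does not state the threshold it used, although 0.5 reproduces the field counts in the alerts for these rows.

```python
# Column order and rows copied from the report's correlation table
# (only the five rows that appear in this excerpt).
COLUMNS = [
    "Expnumberfirst60AAs", "ExpnumberofAAsinTMHs", "Insideend", "Insidestart",
    "Length", "Outsideend", "Outsidestart", "Phage_source",
    "PredictedTMHsNumber", "TMhelixend", "TMhelixstart", "TotalprobofNin",
]
ROWS = {
    "Expnumberfirst60AAs": [1.000, 0.409, -0.166, 0.127, -0.351, -0.327,
                            -0.118, 0.046, 0.355, -0.143, -0.163, 0.156],
    "ExpnumberofAAsinTMHs": [0.409, 1.000, 0.512, 0.728, 0.268, 0.322,
                             0.584, 0.047, 0.872, 0.692, 0.672, 0.089],
    "Insideend": [-0.166, 0.512, 1.000, 0.783, 0.516, 0.114,
                  0.286, 0.014, 0.530, 0.662, 0.655, -0.259],
    "Insidestart": [0.127, 0.728, 0.783, 1.000, 0.285, 0.114,
                    0.250, 0.012, 0.780, 0.660, 0.655, -0.190],
    "Length": [-0.351, 0.268, 0.516, 0.285, 1.000, 0.738,
               0.439, 0.018, 0.260, 0.480, 0.484, -0.011],
}

# Assumed cut-off; not stated anywhere in the report.
THRESHOLD = 0.5


def highly_correlated(variable: str, threshold: float = THRESHOLD) -> list[str]:
    """Return the other columns whose |correlation| with `variable` exceeds the threshold."""
    return [
        other
        for other, value in zip(COLUMNS, ROWS[variable])
        if other != variable and abs(value) > threshold
    ]


for name in ROWS:
    print(name, "->", highly_correlated(name))
```

Under this assumption Length is flagged against Insideend and Outsideend only, which matches the alert wording "Insideend and 1 other field".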
Reproduction
| Analysis started | 2025-07-10 08:39:03.908468 |
|---|---|
| Analysis finished | 2025-07-10 08:41:22.796491 |
| Duration | 2 minutes and 18.89 seconds |
| Software version | ydata-profiling v4.16.1 |
| Download configuration | config.json |
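A report like this one is typically regenerated with a few lines of ydata-profiling. The sketch below is an illustration only: the function name, the title, and any file paths passed to it are hypothetical, since the report does not state the input file or the options used. The third-party imports sit inside the function so that defining it does not require the packages to be installed.

```python
def build_profile(csv_path: str, html_path: str) -> None:
    """Profile a CSV file with ydata-profiling and write an HTML report."""
    # Lazy imports: pandas and ydata-profiling are only needed when the
    # function is actually called.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # Default settings are assumed here; the original configuration is
    # in the report's config.json, which is not reproduced in this text.
    report = ProfileReport(df, title="TMHMM predictions profile")
    report.to_file(html_path)
```

Calling `build_profile("tmhmm_predictions.csv", "tmhmm_profile.html")` (both names are placeholders) would write a new report; its run time will differ from the duration listed above depending on hardware and options.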
Variables
Phage_ID
Text
| Distinct | 589382 |
|---|---|
| Distinct (%) | 20.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 195.6 MiB |
Length
| Max length | 88 |
|---|---|
| Median length | 87 |
| Mean length | 23.728165 |
| Min length | 5 |
Unique
| Unique | 107527 |
|---|---|
| Unique (%) | 3.8% |
Sample
| 1st row | NC_001330.1 |
|---|---|
| 2nd row | NC_001331.1 |
| 3rd row | NC_001331.1 |
| 4th row | NC_001331.1 |
| 5th row | NC_001331.1 |
| Value | Count | Frequency (%) |
| samn01774283_a1_ct717 | 118 | < 0.1% |
| uvig_134152 | 104 | < 0.1% |
| samn01773488_b1_ct3 | 101 | < 0.1% |
| uvig_20542 | 97 | < 0.1% |
| mgv-genome-0380253 | 96 | < 0.1% |
| mgv-genome-0380244 | 95 | < 0.1% |
| uvig_544214 | 91 | < 0.1% |
| mgv-genome-0380194 | 88 | < 0.1% |
| mgv-genome-0380160 | 88 | < 0.1% |
| mgv-genome-0380240 | 87 | < 0.1% |
| Other values (589372) | 2819267 | > 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| _ | 6419020 | 9.6% |
| 1 | 3451421 | 5.2% |
| 0 | 2972563 | 4.4% |
| 3 | 2820088 | 4.2% |
| 2 | 2793748 | 4.2% |
| E | 2333405 | 3.5% |
| 4 | 2286551 | 3.4% |
| 5 | 2254271 | 3.4% |
| M | 2126961 | 3.2% |
| 7 | 2090878 | 3.1% |
| Other values (55) | 37370023 | 55.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 66918929 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 66918929 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 66918929 | 100.0% |
Protein_ID
Text
Unique 
| Distinct | 2820232 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 203.2 MiB |
Length
| Max length | 91 |
|---|---|
| Median length | 89 |
| Mean length | 26.540974 |
| Min length | 7 |
Unique
| Unique | 2820232 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | NP_039595.1 |
|---|---|
| 2nd row | NP_039601.1 |
| 3rd row | NP_039602.1 |
| 4th row | NP_039603.1 |
| 5th row | NP_039604.1 |
| Value | Count | Frequency (%) |
| np_039699.1 | 1 | < 0.1% |
| biochar_6180_5 | 1 | < 0.1% |
| np_039595.1 | 1 | < 0.1% |
| np_039601.1 | 1 | < 0.1% |
| np_039602.1 | 1 | < 0.1% |
| np_039603.1 | 1 | < 0.1% |
| np_039604.1 | 1 | < 0.1% |
| np_039606.1 | 1 | < 0.1% |
| biochar_6126_21 | 1 | < 0.1% |
| biochar_6133_4 | 1 | < 0.1% |
| Other values (2820222) | 2820222 | > 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| _ | 9172554 | 12.3% |
| 1 | 4398427 | 5.9% |
| 2 | 3533107 | 4.7% |
| 3 | 3474835 | 4.6% |
| 0 | 3294552 | 4.4% |
| 4 | 2842511 | 3.8% |
| 5 | 2735874 | 3.7% |
| 7 | 2458803 | 3.3% |
| 6 | 2447491 | 3.3% |
| E | 2335532 | 3.1% |
| Other values (55) | 38158017 | 51.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 74851703 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 74851703 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 74851703 | 100.0% |
Length
Real number (ℝ)
High correlation 
| Distinct | 3523 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 220.91134 |
| Minimum | 18 |
| Maximum | 13719 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 47 |
| Q1 | 81 |
| median | 129 |
| Q3 | 218 |
| 95-th percentile | 788 |
| Maximum | 13719 |
| Range | 13701 |
| Interquartile range (IQR) | 137 |
Descriptive statistics
| Standard deviation | 291.51083 |
|---|---|
| Coefficient of variation (CV) | 1.3195829 |
| Kurtosis | 46.660059 |
| Mean | 220.91134 |
| Median Absolute Deviation (MAD) | 58 |
| Skewness | 4.8169991 |
| Sum | 6.2302122 × 10⁸ |
| Variance | 84978.562 |
| Monotonicity | Not monotonic |
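Several entries in the quantile and descriptive tables for Length are derived from the others. A small consistency check in plain Python, using only figures printed above (the variable names are illustrative):

```python
# Values copied from the quantile and descriptive tables for Length.
minimum, q1, q3, maximum = 18, 81, 218, 13719
mean = 220.91134
std = 291.51083
n_observations = 2_820_232

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(f"range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"CV       = {cv:.7f}")
print(f"variance = {variance:.3f}")
print(f"sum      = {total:.4e}")
```

Small differences in the last digits are expected because the report prints the mean and standard deviation rounded to eight significant figures.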
| Value | Count | Frequency (%) |
| 71 | 21415 | 0.8% |
| 66 | 20631 | 0.7% |
| 68 | 20549 | 0.7% |
| 60 | 20340 | 0.7% |
| 67 | 19434 | 0.7% |
| 70 | 19106 | 0.7% |
| 55 | 18774 | 0.7% |
| 77 | 18237 | 0.6% |
| 93 | 18037 | 0.6% |
| 99 | 17882 | 0.6% |
| Other values (3513) | 2625827 | 93.1% |
| Value | Count | Frequency (%) |
| 18 | 1 | < 0.1% |
| 20 | 27 | < 0.1% |
| 21 | 35 | < 0.1% |
| 22 | 52 | < 0.1% |
| 23 | 79 | < 0.1% |
| 24 | 137 | < 0.1% |
| 25 | 166 | < 0.1% |
| 26 | 202 | < 0.1% |
| 27 | 306 | < 0.1% |
| 28 | 458 | < 0.1% |
| Value | Count | Frequency (%) |
| 13719 | 1 | < 0.1% |
| 13380 | 1 | < 0.1% |
| 9455 | 1 | < 0.1% |
| 9097 | 1 | < 0.1% |
| 8731 | 1 | < 0.1% |
| 8300 | 1 | < 0.1% |
| 7972 | 1 | < 0.1% |
| 7748 | 1 | < 0.1% |
| 7700 | 2 | < 0.1% |
| 7699 | 2 | < 0.1% |
PredictedTMHsNumber
Real number (ℝ)
High correlation 
| Distinct | 34 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8934964 |
| Minimum | 1 |
| Maximum | 48 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 48 |
| Range | 47 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.8690251 |
|---|---|
| Coefficient of variation (CV) | 0.98707613 |
| Kurtosis | 27.54299 |
| Mean | 1.8934964 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.4129981 |
| Sum | 5340099 |
| Variance | 3.4932546 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 1670464 | 59.2% |
| 2 | 676617 | 24.0% |
| 3 | 209168 | 7.4% |
| 4 | 108505 | 3.8% |
| 5 | 39806 | 1.4% |
| 6 | 33762 | 1.2% |
| 10 | 15570 | 0.6% |
| 7 | 14218 | 0.5% |
| 8 | 12937 | 0.5% |
| 12 | 9254 | 0.3% |
| Other values (24) | 29931 | 1.1% |
| Value | Count | Frequency (%) |
| 1 | 1670464 | 59.2% |
| 2 | 676617 | 24.0% |
| 3 | 209168 | 7.4% |
| 4 | 108505 | 3.8% |
| 5 | 39806 | 1.4% |
| 6 | 33762 | 1.2% |
| 7 | 14218 | 0.5% |
| 8 | 12937 | 0.5% |
| 9 | 7808 | 0.3% |
| 10 | 15570 | 0.6% |
| Value | Count | Frequency (%) |
| 48 | 1 | < 0.1% |
| 36 | 2 | < 0.1% |
| 34 | 2 | < 0.1% |
| 32 | 16 | < 0.1% |
| 30 | 13 | < 0.1% |
| 29 | 6 | < 0.1% |
| 28 | 30 | < 0.1% |
| 27 | 6 | < 0.1% |
| 26 | 103 | < 0.1% |
| 25 | 13 | < 0.1% |
ExpnumberofAAsinTMHs
Real number (ℝ)
High correlation 
| Distinct | 983818 |
|---|---|
| Distinct (%) | 34.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.850682 |
| Minimum | 6.44741 |
| Maximum | 1029.8286 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 6.44741 |
|---|---|
| 5-th percentile | 17.44989 |
| Q1 | 20.85793 |
| median | 23.14312 |
| Q3 | 44.29862 |
| 95-th percentile | 112.84403 |
| Maximum | 1029.8286 |
| Range | 1023.3812 |
| Interquartile range (IQR) | 23.44069 |
Descriptive statistics
| Standard deviation | 43.234392 |
|---|---|
| Coefficient of variation (CV) | 1.033063 |
| Kurtosis | 27.438665 |
| Mean | 41.850682 |
| Median Absolute Deviation (MAD) | 5.379675 |
| Skewness | 4.3714055 |
| Sum | 1.1802863 × 10⁸ |
| Variance | 1869.2127 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 18.23661 | 3989 | 0.1% |
| 24.87583 | 3985 | 0.1% |
| 36.04048 | 2768 | 0.1% |
| 210.43458 | 2391 | 0.1% |
| 37.75642 | 2117 | 0.1% |
| 47.86547 | 1952 | 0.1% |
| 108.33627 | 1755 | 0.1% |
| 22.0628 | 1703 | 0.1% |
| 22.05877 | 1691 | 0.1% |
| 20.65344 | 1463 | 0.1% |
| Other values (983808) | 2796418 | 99.2% |
| Value | Count | Frequency (%) |
| 6.44741 | 2 | < 0.1% |
| 6.55333 | 3 | < 0.1% |
| 6.71723 | 1 | < 0.1% |
| 6.76893 | 1 | < 0.1% |
| 6.83648 | 1 | < 0.1% |
| 6.87749 | 1 | < 0.1% |
| 7.01696 | 1 | < 0.1% |
| 7.10047 | 1 | < 0.1% |
| 7.16215 | 1 | < 0.1% |
| 7.16231 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1029.82865 | 1 | < 0.1% |
| 793.04047 | 1 | < 0.1% |
| 782.99323 | 1 | < 0.1% |
| 777.33773 | 1 | < 0.1% |
| 775.5699 | 1 | < 0.1% |
| 750.08628 | 1 | < 0.1% |
| 740.20257 | 1 | < 0.1% |
| 740.12692 | 1 | < 0.1% |
| 739.44781 | 1 | < 0.1% |
| 725.55356 | 1 | < 0.1% |
Expnumberfirst60AAs
Real number (ℝ)
Zeros 
| Distinct | 786820 |
|---|---|
| Distinct (%) | 27.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.514973 |
| Minimum | 0 |
| Maximum | 52.88279 |
| Zeros | 143860 |
| Zeros (%) | 5.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 16.436062 |
| median | 21.199435 |
| Q3 | 25.097315 |
| 95-th percentile | 41.50342 |
| Maximum | 52.88279 |
| Range | 52.88279 |
| Interquartile range (IQR) | 8.6612525 |
Descriptive statistics
| Standard deviation | 12.212541 |
|---|---|
| Coefficient of variation (CV) | 0.5952989 |
| Kurtosis | -0.45859958 |
| Mean | 20.514973 |
| Median Absolute Deviation (MAD) | 4.443205 |
| Skewness | -0.10037227 |
| Sum | 57856984 |
| Variance | 149.14616 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 143860 | 5.1% |
| 0.00018 | 5640 | 0.2% |
| 18.23661 | 3989 | 0.1% |
| 24.87583 | 3985 | 0.1% |
| 42.15085 | 3664 | 0.1% |
| 0.00019 | 3210 | 0.1% |
| 0.0002 | 3166 | 0.1% |
| 36.04048 | 2768 | 0.1% |
| 1 × 10⁻⁵ | 2592 | 0.1% |
| 0.00017 | 2284 | 0.1% |
| Other values (786810) | 2645074 | 93.8% |
| Value | Count | Frequency (%) |
| 0 | 143860 | 5.1% |
| 1 × 10⁻⁵ | 2592 | 0.1% |
| 2 × 10⁻⁵ | 1341 | < 0.1% |
| 3 × 10⁻⁵ | 1343 | < 0.1% |
| 4 × 10⁻⁵ | 900 | < 0.1% |
| 5 × 10⁻⁵ | 768 | < 0.1% |
| 6 × 10⁻⁵ | 927 | < 0.1% |
| 7 × 10⁻⁵ | 794 | < 0.1% |
| 8 × 10⁻⁵ | 1161 | < 0.1% |
| 9 × 10⁻⁵ | 1054 | < 0.1% |
| Value | Count | Frequency (%) |
| 52.88279 | 2 | < 0.1% |
| 52.56452 | 1 | < 0.1% |
| 52.4412 | 6 | < 0.1% |
| 52.39016 | 1 | < 0.1% |
| 52.34317 | 1 | < 0.1% |
| 52.10504 | 1 | < 0.1% |
| 51.57802 | 1 | < 0.1% |
| 51.56347 | 1 | < 0.1% |
| 51.33831 | 1 | < 0.1% |
| 51.33708 | 2 | < 0.1% |
TotalprobofNin
Real number (ℝ)
| Distinct | 99923 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.58968644 |
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 157 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.02219 |
| Q1 | 0.23569 |
| median | 0.69128 |
| Q3 | 0.92597 |
| 95-th percentile | 0.99604 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.69028 |
Descriptive statistics
| Standard deviation | 0.35280531 |
|---|---|
| Coefficient of variation (CV) | 0.59829306 |
| Kurtosis | -1.4057218 |
| Mean | 0.58968644 |
| Median Absolute Deviation (MAD) | 0.27981 |
| Skewness | -0.37835311 |
| Sum | 1663052.6 |
| Variance | 0.12447158 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.59265 | 3993 | 0.1% |
| 0.56701 | 3988 | 0.1% |
| 0.99854 | 3981 | 0.1% |
| 0.03881 | 3512 | 0.1% |
| 0.95017 | 3014 | 0.1% |
| 0.86194 | 2792 | 0.1% |
| 0.28216 | 1964 | 0.1% |
| 0.99957 | 1898 | 0.1% |
| 0.99602 | 1846 | 0.1% |
| 0.99959 | 1839 | 0.1% |
| Other values (99913) | 2791405 | 99.0% |
| Value | Count | Frequency (%) |
| 0 | 157 | < 0.1% |
| 1 × 10⁻⁵ | 286 | < 0.1% |
| 2 × 10⁻⁵ | 318 | < 0.1% |
| 3 × 10⁻⁵ | 252 | < 0.1% |
| 4 × 10⁻⁵ | 389 | < 0.1% |
| 5 × 10⁻⁵ | 238 | < 0.1% |
| 6 × 10⁻⁵ | 625 | < 0.1% |
| 7 × 10⁻⁵ | 176 | < 0.1% |
| 8 × 10⁻⁵ | 155 | < 0.1% |
| 9 × 10⁻⁵ | 95 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 591 | < 0.1% |
| 0.99999 | 987 | < 0.1% |
| 0.99998 | 945 | < 0.1% |
| 0.99997 | 948 | < 0.1% |
| 0.99996 | 860 | < 0.1% |
| 0.99995 | 1134 | < 0.1% |
| 0.99994 | 634 | < 0.1% |
| 0.99993 | 726 | < 0.1% |
| 0.99992 | 669 | < 0.1% |
| 0.99991 | 621 | < 0.1% |
POSSIBLENterm
Boolean
Constant  Missing 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 546560 |
| Missing (%) | 19.4% |
| Memory size | 94.7 MiB |
| Value | Count | Frequency (%) |
| True | 2273672 | 80.6% |
| (Missing) | 546560 | 19.4% |
Insidesource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 153.3 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
| TMHMM2.0 | 2820232 | 100.0% |
[Plots: Length, Common Values (Plot)]
| Value | Count | Frequency (%) |
| tmhmm2.0 | 2820232 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 8460696 | 37.5% |
| T | 2820232 | 12.5% |
| H | 2820232 | 12.5% |
| 2 | 2820232 | 12.5% |
| . | 2820232 | 12.5% |
| 0 | 2820232 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Insidestart
Real number (ℝ)
High correlation 
| Distinct | 2649 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.334258 |
| Minimum | 1 |
| Maximum | 13249 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 33 |
| Q3 | 86 |
| 95-th percentile | 465 |
| Maximum | 13249 |
| Range | 13248 |
| Interquartile range (IQR) | 85 |
Descriptive statistics
| Standard deviation | 195.32107 |
|---|---|
| Coefficient of variation (CV) | 2.092705 |
| Kurtosis | 133.92739 |
| Mean | 93.334258 |
| Median Absolute Deviation (MAD) | 32 |
| Skewness | 7.1304808 |
| Sum | 2.6322426 × 10⁸ |
| Variance | 38150.32 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 935612 | 33.2% |
| 27 | 101890 | 3.6% |
| 28 | 92965 | 3.3% |
| 33 | 74995 | 2.7% |
| 38 | 62908 | 2.2% |
| 24 | 58822 | 2.1% |
| 22 | 46873 | 1.7% |
| 25 | 37458 | 1.3% |
| 43 | 34044 | 1.2% |
| 23 | 26591 | 0.9% |
| Other values (2639) | 1348074 | 47.8% |
| Value | Count | Frequency (%) |
| 1 | 935612 | 33.2% |
| 19 | 1105 | < 0.1% |
| 20 | 1683 | 0.1% |
| 21 | 892 | < 0.1% |
| 22 | 46873 | 1.7% |
| 23 | 26591 | 0.9% |
| 24 | 58822 | 2.1% |
| 25 | 37458 | 1.3% |
| 26 | 9835 | 0.3% |
| 27 | 101890 | 3.6% |
| Value | Count | Frequency (%) |
| 13249 | 1 | < 0.1% |
| 7965 | 1 | < 0.1% |
| 7680 | 2 | < 0.1% |
| 7679 | 2 | < 0.1% |
| 7675 | 4 | < 0.1% |
| 7674 | 13 | < 0.1% |
| 7673 | 6 | < 0.1% |
| 7672 | 2 | < 0.1% |
| 7661 | 1 | < 0.1% |
| 7561 | 1 | < 0.1% |
Insideend
Real number (ℝ)
High correlation 
| Distinct | 2719 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 139.64195 |
| Minimum | 1 |
| Maximum | 13380 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 36 |
| median | 86 |
| Q3 | 152 |
| 95-th percentile | 523 |
| Maximum | 13380 |
| Range | 13379 |
| Interquartile range (IQR) | 116 |
Descriptive statistics
| Standard deviation | 208.04787 |
|---|---|
| Coefficient of variation (CV) | 1.4898665 |
| Kurtosis | 106.17795 |
| Mean | 139.64195 |
| Median Absolute Deviation (MAD) | 56 |
| Skewness | 6.2603977 |
| Sum | 3.9382271 × 10⁸ |
| Variance | 43283.918 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 217509 | 7.7% |
| 4 | 76991 | 2.7% |
| 12 | 75715 | 2.7% |
| 11 | 51710 | 1.8% |
| 20 | 48564 | 1.7% |
| 19 | 29770 | 1.1% |
| 8 | 24743 | 0.9% |
| 67 | 21399 | 0.8% |
| 1 | 20753 | 0.7% |
| 55 | 20245 | 0.7% |
| Other values (2709) | 2232833 | 79.2% |
| Value | Count | Frequency (%) |
| 1 | 20753 | 0.7% |
| 2 | 2875 | 0.1% |
| 4 | 76991 | 2.7% |
| 6 | 217509 | 7.7% |
| 8 | 24743 | 0.9% |
| 10 | 1894 | 0.1% |
| 11 | 51710 | 1.8% |
| 12 | 75715 | 2.7% |
| 15 | 3668 | 0.1% |
| 16 | 9678 | 0.3% |
| Value | Count | Frequency (%) |
| 13380 | 1 | < 0.1% |
| 7972 | 1 | < 0.1% |
| 7700 | 2 | < 0.1% |
| 7699 | 2 | < 0.1% |
| 7695 | 4 | < 0.1% |
| 7694 | 13 | < 0.1% |
| 7693 | 6 | < 0.1% |
| 7692 | 2 | < 0.1% |
| 7681 | 1 | < 0.1% |
| 7570 | 1 | < 0.1% |
TMhelixsource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 153.3 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
| TMHMM2.0 | 2820232 | 100.0% |
[Plots: Length, Common Values (Plot)]
| Value | Count | Frequency (%) |
| tmhmm2.0 | 2820232 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 8460696 | 37.5% |
| T | 2820232 | 12.5% |
| H | 2820232 | 12.5% |
| 2 | 2820232 | 12.5% |
| . | 2820232 | 12.5% |
| 0 | 2820232 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
TMhelixstart
Real number (ℝ)
High correlation 
| Distinct | 2653 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 97.298779 |
| Minimum | 2 |
| Maximum | 13226 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 10 |
| median | 36 |
| Q3 | 90 |
| 95-th percentile | 471 |
| Maximum | 13226 |
| Range | 13224 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 196.28246 |
|---|---|
| Coefficient of variation (CV) | 2.0173168 |
| Kurtosis | 129.90122 |
| Mean | 97.298779 |
| Median Absolute Deviation (MAD) | 29 |
| Skewness | 7.0253554 |
| Sum | 2.7440513 × 10⁸ |
| Variance | 38526.804 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7 | 218079 | 7.7% |
| 5 | 197680 | 7.0% |
| 4 | 188617 | 6.7% |
| 10 | 98390 | 3.5% |
| 13 | 80756 | 2.9% |
| 15 | 67965 | 2.4% |
| 20 | 59910 | 2.1% |
| 12 | 51934 | 1.8% |
| 21 | 48571 | 1.7% |
| 39 | 34572 | 1.2% |
| Other values (2643) | 1773758 | 62.9% |
| Value | Count | Frequency (%) |
| 2 | 20753 | 0.7% |
| 3 | 2875 | 0.1% |
| 4 | 188617 | 6.7% |
| 5 | 197680 | 7.0% |
| 6 | 15436 | 0.5% |
| 7 | 218079 | 7.7% |
| 9 | 24805 | 0.9% |
| 10 | 98390 | 3.5% |
| 11 | 5227 | 0.2% |
| 12 | 51934 | 1.8% |
| Value | Count | Frequency (%) |
| 13226 | 1 | < 0.1% |
| 7943 | 1 | < 0.1% |
| 7657 | 2 | < 0.1% |
| 7656 | 2 | < 0.1% |
| 7652 | 4 | < 0.1% |
| 7651 | 13 | < 0.1% |
| 7650 | 6 | < 0.1% |
| 7649 | 2 | < 0.1% |
| 7638 | 1 | < 0.1% |
| 7538 | 1 | < 0.1% |
TMhelixend
Real number (ℝ)
High correlation 
| Distinct | 2673 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 118.14131 |
| Minimum | 16 |
| Maximum | 13248 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 16 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 31 |
| median | 57 |
| Q3 | 112 |
| 95-th percentile | 491 |
| Maximum | 13248 |
| Range | 13232 |
| Interquartile range (IQR) | 81 |
Descriptive statistics
| Standard deviation | 196.51164 |
|---|---|
| Coefficient of variation (CV) | 1.663361 |
| Kurtosis | 129.39934 |
| Mean | 118.14131 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | 7.0105449 |
| Sum | 3.3318589 × 10⁸ |
| Variance | 38616.826 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 29 | 155069 | 5.5% |
| 26 | 136386 | 4.8% |
| 27 | 133669 | 4.7% |
| 24 | 91901 | 3.3% |
| 32 | 74386 | 2.6% |
| 35 | 60142 | 2.1% |
| 23 | 52527 | 1.9% |
| 37 | 51684 | 1.8% |
| 42 | 47586 | 1.7% |
| 34 | 46858 | 1.7% |
| Other values (2663) | 1970024 | 69.9% |
| Value | Count | Frequency (%) |
| 16 | 6 | < 0.1% |
| 17 | 24 | < 0.1% |
| 18 | 797 | < 0.1% |
| 19 | 3976 | 0.1% |
| 20 | 2123 | 0.1% |
| 21 | 39480 | 1.4% |
| 22 | 34255 | 1.2% |
| 23 | 52527 | 1.9% |
| 24 | 91901 | 3.3% |
| 25 | 19524 | 0.7% |
| Value | Count | Frequency (%) |
| 13248 | 1 | < 0.1% |
| 7964 | 1 | < 0.1% |
| 7679 | 2 | < 0.1% |
| 7678 | 2 | < 0.1% |
| 7674 | 4 | < 0.1% |
| 7673 | 13 | < 0.1% |
| 7672 | 6 | < 0.1% |
| 7671 | 2 | < 0.1% |
| 7660 | 1 | < 0.1% |
| 7560 | 1 | < 0.1% |
Outsidesource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 153.3 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
| TMHMM2.0 | 2820232 | 100.0% |
[Plots: Length, Common Values (Plot)]
| Value | Count | Frequency (%) |
| tmhmm2.0 | 2820232 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 8460696 | 37.5% |
| T | 2820232 | 12.5% |
| H | 2820232 | 12.5% |
| 2 | 2820232 | 12.5% |
| . | 2820232 | 12.5% |
| 0 | 2820232 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 22561856 | 100.0% |
Outsidestart
Real number (ℝ)
High correlation 
| Distinct | 2229 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 89.368286 |
| Minimum | 1 |
| Maximum | 6719 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 35 |
| Q3 | 85 |
| 95-th percentile | 425 |
| Maximum | 6719 |
| Range | 6718 |
| Interquartile range (IQR) | 84 |
Descriptive statistics
| Standard deviation | 171.95096 |
|---|---|
| Coefficient of variation (CV) | 1.9240713 |
| Kurtosis | 38.949965 |
| Mean | 89.368286 |
| Median Absolute Deviation (MAD) | 34 |
| Skewness | 4.8009472 |
| Sum | 2.520393 × 10⁸ |
| Variance | 29567.132 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 734852 | 26.1% |
| 30 | 190705 | 6.8% |
| 25 | 99846 | 3.5% |
| 36 | 94404 | 3.3% |
| 28 | 77838 | 2.8% |
| 27 | 75507 | 2.7% |
| 44 | 60261 | 2.1% |
| 35 | 57451 | 2.0% |
| 32 | 44830 | 1.6% |
| 43 | 34560 | 1.2% |
| Other values (2219) | 1349978 | 47.9% |
| Value | Count | Frequency (%) |
| 1 | 734852 | 26.1% |
| 17 | 300 | < 0.1% |
| 18 | 128 | < 0.1% |
| 19 | 382 | < 0.1% |
| 20 | 9969 | 0.4% |
| 21 | 4618 | 0.2% |
| 22 | 10260 | 0.4% |
| 23 | 27778 | 1.0% |
| 24 | 6487 | 0.2% |
| 25 | 99846 | 3.5% |
| Value | Count | Frequency (%) |
| 6719 | 1 | < 0.1% |
| 6658 | 1 | < 0.1% |
| 5856 | 3 | < 0.1% |
| 5656 | 1 | < 0.1% |
| 5310 | 1 | < 0.1% |
| 5168 | 1 | < 0.1% |
| 5088 | 1 | < 0.1% |
| 4933 | 2 | < 0.1% |
| 4872 | 1 | < 0.1% |
| 4850 | 1 | < 0.1% |
Outsideend
Real number (ℝ)
High correlation 
| Distinct | 3497 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 177.56816 |
| Minimum | 3 |
| Maximum | 13719 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.5 MiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 33 |
| median | 81 |
| Q3 | 185 |
| 95-th percentile | 732 |
| Maximum | 13719 |
| Range | 13716 |
| Interquartile range (IQR) | 152 |
Descriptive statistics
| Standard deviation | 295.40886 |
|---|---|
| Coefficient of variation (CV) | 1.6636364 |
| Kurtosis | 44.989715 |
| Mean | 177.56816 |
| Median Absolute Deviation (MAD) | 62 |
| Skewness | 4.7425366 |
| Sum | 5.0078341 × 10⁸ |
| Variance | 87266.397 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 188617 | 6.7% |
| 4 | 120689 | 4.3% |
| 9 | 98390 | 3.5% |
| 14 | 67965 | 2.4% |
| 38 | 30526 | 1.1% |
| 19 | 30140 | 1.1% |
| 33 | 24236 | 0.9% |
| 30 | 22163 | 0.8% |
| 28 | 21021 | 0.7% |
| 32 | 20753 | 0.7% |
| Other values (3487) | 2195732 | 77.9% |
| Value | Count | Frequency (%) |
| 3 | 188617 | 6.7% |
| 4 | 120689 | 4.3% |
| 5 | 15436 | 0.5% |
| 6 | 570 | < 0.1% |
| 8 | 62 | < 0.1% |
| 9 | 98390 | 3.5% |
| 10 | 3333 | 0.1% |
| 11 | 224 | < 0.1% |
| 12 | 5041 | 0.2% |
| 14 | 67965 | 2.4% |
| Value | Count | Frequency (%) |
| 13719 | 1 | < 0.1% |
| 13225 | 1 | < 0.1% |
| 9455 | 1 | < 0.1% |
| 9097 | 1 | < 0.1% |
| 8731 | 1 | < 0.1% |
| 8300 | 1 | < 0.1% |
| 7942 | 1 | < 0.1% |
| 7748 | 1 | < 0.1% |
| 7656 | 2 | < 0.1% |
| 7655 | 2 | < 0.1% |
Phage_source
Categorical
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 142.1 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 3 |
| Mean length | 3.8176186 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| MGV | 830363 | 29.4% |
| GPD | 741785 | 26.3% |
| TemPhD | 437596 | 15.5% |
| GOV2 | 384232 | 13.6% |
| CHVD | 198934 | 7.1% |
| GVD | 80967 | 2.9% |
| RefSeq | 43567 | 1.5% |
| IGVD | 33306 | 1.2% |
| PhagesDB | 32227 | 1.1% |
| Genbank | 20549 | 0.7% |
| Other values (3) | 16706 | 0.6% |
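The Frequency (%) column is each count divided by the 2820232 observations. The sketch below reproduces the displayed strings under an assumed display rule: one decimal place, with "< 0.1%" for a non-zero share that rounds to zero at three decimals and "> 99.9%" for a share below one that rounds to one. That rule is inferred from the rows shown in this report, not taken from the library's source, and the function name is hypothetical.

```python
TOTAL_OBSERVATIONS = 2_820_232  # "Number of observations" in the overview


def fmt_frequency(count: int, total: int = TOTAL_OBSERVATIONS) -> str:
    """Format count/total the way the report's tables appear to (assumed rule)."""
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"


# Counts taken from the Phage_source "Common Values" table.
for source, count in [("MGV", 830363), ("CHVD", 198934), ("Genbank", 20549)]:
    print(source, fmt_frequency(count))
```

Because each percentage is rounded independently, the column need not sum to exactly 100.0%.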
[Plot: Length]
| Value | Count | Frequency (%) |
| mgv | 830363 | 29.4% |
| gpd | 741785 | 26.3% |
| temphd | 437596 | 15.5% |
| gov2 | 384232 | 13.6% |
| chvd | 198934 | 7.1% |
| gvd | 80967 | 2.9% |
| refseq | 43567 | 1.5% |
| igvd | 33306 | 1.2% |
| phagesdb | 32227 | 1.1% |
| genbank | 20549 | 0.7% |
| Other values (3) | 16706 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 2091202 | 19.4% |
| V | 1541926 | 14.3% |
| D | 1527933 | 14.2% |
| P | 1211608 | 11.3% |
| M | 831386 | 7.7% |
| e | 577506 | 5.4% |
| h | 469823 | 4.4% |
| T | 451720 | 4.2% |
| m | 437596 | 4.1% |
| O | 384232 | 3.6% |
| Other values (18) | 1241638 |
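The character counts can be cross-checked against the Common Values table by weighting every character of a source name by the number of rows carrying that name. This is a sketch under one stated limitation: the three values grouped under "Other values (3)" (16706 rows) are not named in that table, so they are left out, and letters that occur in them (for example V, D, M and T) come out slightly below the reported totals. The helper name `weighted_char_counts` is illustrative.

```python
from collections import Counter

# The ten named Phage_source values and their row counts (Common Values table).
# The three unnamed "Other values" are omitted because their names are not listed.
source_counts = {
    "MGV": 830363,
    "GPD": 741785,
    "TemPhD": 437596,
    "GOV2": 384232,
    "CHVD": 198934,
    "GVD": 80967,
    "RefSeq": 43567,
    "IGVD": 33306,
    "PhagesDB": 32227,
    "Genbank": 20549,
}


def weighted_char_counts(value_counts):
    """Count characters across all rows, given per-value row counts."""
    chars = Counter()
    for value, n in value_counts.items():
        for ch in value:
            chars[ch] += n  # each row contributes every character of its value
    return chars


char_counts = weighted_char_counts(source_counts)
for ch in "GPeh":
    print(ch, char_counts[ch])
```

For letters that appear only in the named values (G, P, e, h), the result matches the report exactly, which suggests the character statistics are case-sensitive counts over the raw strings.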
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10766570 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10766570 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 10766570 | 100.0% |
Interactions
Correlations
| | Expnumberfirst60AAs | ExpnumberofAAsinTMHs | Insideend | Insidestart | Length | Outsideend | Outsidestart | Phage_source | PredictedTMHsNumber | TMhelixend | TMhelixstart | TotalprobofNin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Expnumberfirst60AAs | 1.000 | 0.409 | -0.166 | 0.127 | -0.351 | -0.327 | -0.118 | 0.046 | 0.355 | -0.143 | -0.163 | 0.156 |
| ExpnumberofAAsinTMHs | 0.409 | 1.000 | 0.512 | 0.728 | 0.268 | 0.322 | 0.584 | 0.047 | 0.872 | 0.692 | 0.672 | 0.089 |
| Insideend | -0.166 | 0.512 | 1.000 | 0.783 | 0.516 | 0.114 | 0.286 | 0.014 | 0.530 | 0.662 | 0.655 | -0.259 |
| Insidestart | 0.127 | 0.728 | 0.783 | 1.000 | 0.285 | 0.114 | 0.250 | 0.012 | 0.780 | 0.660 | 0.655 | -0.190 |
| Length | -0.351 | 0.268 | 0.516 | 0.285 | 1.000 | 0.738 | 0.439 | 0.018 | 0.260 | 0.480 | 0.484 | -0.011 |
| Outsideend | -0.327 | 0.322 | 0.114 | 0.114 | 0.738 | 1.000 | 0.728 | 0.018 | 0.306 | 0.607 | 0.619 | 0.240 |
| Outsidestart | -0.118 | 0.584 | 0.286 | 0.250 | 0.439 | 0.728 | 1.000 | 0.017 | 0.620 | 0.761 | 0.763 | 0.288 |
| Phage_source | 0.046 | 0.047 | 0.014 | 0.012 | 0.018 | 0.018 | 0.017 | 1.000 | 0.047 | 0.012 | 0.012 | 0.028 |
| PredictedTMHsNumber | 0.355 | 0.872 | 0.530 | 0.780 | 0.260 | 0.306 | 0.620 | 0.047 | 1.000 | 0.676 | 0.678 | 0.102 |
| TMhelixend | -0.143 | 0.692 | 0.662 | 0.660 | 0.480 | 0.607 | 0.761 | 0.012 | 0.676 | 1.000 | 0.994 | 0.093 |
| TMhelixstart | -0.163 | 0.672 | 0.655 | 0.655 | 0.484 | 0.619 | 0.763 | 0.012 | 0.678 | 0.994 | 1.000 | 0.102 |
| TotalprobofNin | 0.156 | 0.089 | -0.259 | -0.190 | -0.011 | 0.240 | 0.288 | 0.028 | 0.102 | 0.093 | 0.102 | 1.000 |
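The "High correlation" alerts in the overview can be related to this matrix by listing, for each variable, the other variables whose absolute coefficient reaches a cutoff. The sketch below assumes a cutoff of 0.5; that value is an assumption, chosen because it reproduces the partner counts stated in the alerts (for example, TMhelixend with seven partners). The function name `high_correlation_partners` is illustrative.

```python
names = [
    "Expnumberfirst60AAs", "ExpnumberofAAsinTMHs", "Insideend", "Insidestart",
    "Length", "Outsideend", "Outsidestart", "Phage_source",
    "PredictedTMHsNumber", "TMhelixend", "TMhelixstart", "TotalprobofNin",
]

# Rows copied from the correlation table, in the same order as `names`.
matrix = [
    [1.000, 0.409, -0.166, 0.127, -0.351, -0.327, -0.118, 0.046, 0.355, -0.143, -0.163, 0.156],
    [0.409, 1.000, 0.512, 0.728, 0.268, 0.322, 0.584, 0.047, 0.872, 0.692, 0.672, 0.089],
    [-0.166, 0.512, 1.000, 0.783, 0.516, 0.114, 0.286, 0.014, 0.530, 0.662, 0.655, -0.259],
    [0.127, 0.728, 0.783, 1.000, 0.285, 0.114, 0.250, 0.012, 0.780, 0.660, 0.655, -0.190],
    [-0.351, 0.268, 0.516, 0.285, 1.000, 0.738, 0.439, 0.018, 0.260, 0.480, 0.484, -0.011],
    [-0.327, 0.322, 0.114, 0.114, 0.738, 1.000, 0.728, 0.018, 0.306, 0.607, 0.619, 0.240],
    [-0.118, 0.584, 0.286, 0.250, 0.439, 0.728, 1.000, 0.017, 0.620, 0.761, 0.763, 0.288],
    [0.046, 0.047, 0.014, 0.012, 0.018, 0.018, 0.017, 1.000, 0.047, 0.012, 0.012, 0.028],
    [0.355, 0.872, 0.530, 0.780, 0.260, 0.306, 0.620, 0.047, 1.000, 0.676, 0.678, 0.102],
    [-0.143, 0.692, 0.662, 0.660, 0.480, 0.607, 0.761, 0.012, 0.676, 1.000, 0.994, 0.093],
    [-0.163, 0.672, 0.655, 0.655, 0.484, 0.619, 0.763, 0.012, 0.678, 0.994, 1.000, 0.102],
    [0.156, 0.089, -0.259, -0.190, -0.011, 0.240, 0.288, 0.028, 0.102, 0.093, 0.102, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not read from the report's configuration


def high_correlation_partners(names, matrix, threshold):
    """Map each variable to the other variables with |r| >= threshold."""
    partners = {}
    for i, name in enumerate(names):
        partners[name] = [
            names[j]
            for j, r in enumerate(matrix[i])
            if j != i and abs(r) >= threshold
        ]
    return partners


partners = high_correlation_partners(names, matrix, THRESHOLD)
for name, others in partners.items():
    if others:
        print(f"{name}: {len(others)} partner(s): {', '.join(others)}")
```

Variables with no partner at this cutoff (Expnumberfirst60AAs, Phage_source, TotalprobofNin) are the ones that have no "High correlation" alert in the overview.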
Missing values
Sample
First rows
| | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NC_001330.1 | NP_039595.1 | 75 | 1 | 22.46252 | 22.46221 | 0.44551 | True | TMHMM2.0 | 33.0 | 75.0 | TMHMM2.0 | 10.0 | 32.0 | TMHMM2.0 | 1.0 | 9.0 | RefSeq |
| 1 | NC_001331.1 | NP_039601.1 | 30 | 1 | 19.48607 | 19.48607 | 0.86987 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 29.0 | TMHMM2.0 | 30.0 | 30.0 | RefSeq |
| 2 | NC_001331.1 | NP_039602.1 | 83 | 1 | 23.01476 | 5.44076 | 0.03037 | NaN | TMHMM2.0 | 80.0 | 83.0 | TMHMM2.0 | 57.0 | 79.0 | TMHMM2.0 | 1.0 | 56.0 | RefSeq |
| 3 | NC_001331.1 | NP_039603.1 | 82 | 2 | 43.70290 | 25.09221 | 0.99660 | True | TMHMM2.0 | 80.0 | 82.0 | TMHMM2.0 | 57.0 | 79.0 | TMHMM2.0 | 43.0 | 56.0 | RefSeq |
| 4 | NC_001331.1 | NP_039604.1 | 437 | 2 | 37.26104 | 19.30140 | 0.90531 | True | TMHMM2.0 | 436.0 | 437.0 | TMHMM2.0 | 418.0 | 435.0 | TMHMM2.0 | 27.0 | 417.0 | RefSeq |
| 5 | NC_001331.1 | NP_039606.1 | 424 | 1 | 19.07514 | 0.00018 | 0.83822 | NaN | TMHMM2.0 | 1.0 | 228.0 | TMHMM2.0 | 229.0 | 248.0 | TMHMM2.0 | 249.0 | 424.0 | RefSeq |
| 6 | NC_001332.1 | NP_039618.1 | 29 | 1 | 22.75097 | 22.75097 | 0.42203 | True | TMHMM2.0 | 28.0 | 29.0 | TMHMM2.0 | 5.0 | 27.0 | TMHMM2.0 | 1.0 | 4.0 | RefSeq |
| 7 | NC_001332.1 | NP_039619.1 | 33 | 1 | 21.27578 | 21.27578 | 0.26393 | True | TMHMM2.0 | 26.0 | 33.0 | TMHMM2.0 | 4.0 | 25.0 | TMHMM2.0 | 1.0 | 3.0 | RefSeq |
| 8 | NC_001332.1 | NP_039620.1 | 84 | 2 | 35.57871 | 21.53046 | 0.97217 | True | TMHMM2.0 | 77.0 | 84.0 | TMHMM2.0 | 59.0 | 76.0 | TMHMM2.0 | 31.0 | 58.0 | RefSeq |
| 9 | NC_001332.1 | NP_039622.1 | 365 | 1 | 23.10212 | 0.02499 | 0.42621 | NaN | TMHMM2.0 | 273.0 | 365.0 | TMHMM2.0 | 255.0 | 272.0 | TMHMM2.0 | 1.0 | 254.0 | RefSeq |
Last rows
| | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2820222 | biochar_6172 | biochar_6172_12 | 56 | 2 | 44.39544 | 44.39544 | 0.25832 | True | TMHMM2.0 | 27.0 | 30.0 | TMHMM2.0 | 31.0 | 53.0 | TMHMM2.0 | 54.0 | 56.0 | STV |
| 2820223 | biochar_6173 | biochar_6173_8 | 64 | 2 | 45.35974 | 44.28917 | 0.29958 | True | TMHMM2.0 | 28.0 | 39.0 | TMHMM2.0 | 40.0 | 62.0 | TMHMM2.0 | 63.0 | 64.0 | STV |
| 2820224 | biochar_6173 | biochar_6173_10 | 174 | 4 | 95.78472 | 35.60953 | 0.69341 | True | TMHMM2.0 | 124.0 | 135.0 | TMHMM2.0 | 136.0 | 158.0 | TMHMM2.0 | 159.0 | 174.0 | STV |
| 2820225 | biochar_6173 | biochar_6173_11 | 29 | 1 | 20.66581 | 20.66581 | 0.51916 | True | TMHMM2.0 | 25.0 | 29.0 | TMHMM2.0 | 5.0 | 24.0 | TMHMM2.0 | 1.0 | 4.0 | STV |
| 2820226 | biochar_6173 | biochar_6173_16 | 89 | 1 | 24.42941 | 24.42612 | 0.84030 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 29.0 | TMHMM2.0 | 30.0 | 89.0 | STV |
| 2820227 | biochar_6175 | biochar_6175_6 | 52 | 2 | 36.49023 | 36.49023 | 0.34638 | True | TMHMM2.0 | 24.0 | 29.0 | TMHMM2.0 | 30.0 | 51.0 | TMHMM2.0 | 52.0 | 52.0 | STV |
| 2820228 | biochar_6175 | biochar_6175_10 | 222 | 1 | 22.77331 | 22.75221 | 0.84202 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 29.0 | TMHMM2.0 | 30.0 | 222.0 | STV |
| 2820229 | biochar_6175 | biochar_6175_14 | 106 | 1 | 19.39243 | 19.39099 | 0.54127 | True | TMHMM2.0 | 1.0 | 4.0 | TMHMM2.0 | 5.0 | 24.0 | TMHMM2.0 | 25.0 | 106.0 | STV |
| 2820230 | biochar_6180 | biochar_6180_1 | 812 | 1 | 21.67516 | 21.66885 | 0.97053 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 29.0 | TMHMM2.0 | 30.0 | 812.0 | STV |
| 2820231 | biochar_6180 | biochar_6180_5 | 683 | 1 | 19.28219 | 17.64365 | 0.87138 | True | TMHMM2.0 | 1.0 | 41.0 | TMHMM2.0 | 42.0 | 61.0 | TMHMM2.0 | 62.0 | 683.0 | STV |
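In the sample rows shown, the Inside, TMhelix and Outside segments of each record sit back to back and the last one ends at Length. That is an observation from these rows only, not a documented property of the dataset. A minimal sketch of that consistency check follows, using four of the sample rows; the `Row` type and the function `segments_tile_to_end` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    protein_id: str
    length: int
    inside: Tuple[int, int]    # (Insidestart, Insideend)
    tmhelix: Tuple[int, int]   # (TMhelixstart, TMhelixend)
    outside: Tuple[int, int]   # (Outsidestart, Outsideend)


# Values copied from the sample tables (floats written as integers).
rows = [
    Row("NP_039595.1", 75, (33, 75), (10, 32), (1, 9)),
    Row("NP_039603.1", 82, (80, 82), (57, 79), (43, 56)),
    Row("NP_039606.1", 424, (1, 228), (229, 248), (249, 424)),
    Row("biochar_6173_10", 174, (124, 135), (136, 158), (159, 174)),
]


def segments_tile_to_end(row):
    """True if the three segments are adjacent and the last ends at Length."""
    segments = sorted([row.inside, row.tmhelix, row.outside])
    adjacent = all(
        nxt[0] == cur[1] + 1 for cur, nxt in zip(segments, segments[1:])
    )
    return adjacent and segments[-1][1] == row.length


results = {row.protein_id: segments_tile_to_end(row) for row in rows}
for protein_id, ok in results.items():
    print(protein_id, "consistent" if ok else "inconsistent")
```

Rows with more than one predicted helix (for example NP_039603.1) do not start at position 1, so the check deliberately tests only adjacency and the final end position, not full coverage of the protein.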